ENH: Support writing timestamps with timezones with to_sql #22654

mroeschke · 2018-09-10T02:04:06Z

closes ENH/SQL: support writing timestamps with timezone in to_sql #9086
closes to_sql method turns datetime64 index to time zone aware in postgres #23510
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2018-09-10T02:04:10Z

Hello @mroeschke! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/io/sql.py !
There are no PEP8 issues in the file pandas/tests/io/test_sql.py !

codecov · 2018-09-10T17:39:12Z

Codecov Report

Merging #22654 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #22654   +/-   ##
=======================================
  Coverage   92.17%   92.17%           
=======================================
  Files         169      169           
  Lines       50708    50708           
=======================================
  Hits        46740    46740           
  Misses       3968     3968

Flag	Coverage Δ
#multiple	`90.58% <ø> (ø)`	⬆️
#single	`42.37% <ø> (+0.01%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 73dd6ec...c7c4a7a.

jorisvandenbossche

Nice! this was long overdue :-)

jorisvandenbossche · 2018-09-13T09:18:24Z

pandas/tests/io/test_sql.py

+        # GH 9086
+        if self.flavor != 'postgresql':
+            msg = "{} does not support datetime with time zone"
+            pytest.skip(msg.format(self.flavor))


Shouldn't we test and assert the behaviour for a database that does not support it? E.g. what happens if you write a column with timezone-aware data to MySQL?
From the SQLAlchemy docs, it seems the flag timezone=True is simply ignored, but how are the datetime values we send then stored? Does that change?

So the values we send do change:

In [67]: idx = pd.date_range("2012-01-01 09:00", periods=3, tz='Europe/Brussels')

In [68]: idx.values.astype('M8[us]').astype(object)
Out[68]:
array([datetime.datetime(2012, 1, 1, 8, 0),
       datetime.datetime(2012, 1, 2, 8, 0),
       datetime.datetime(2012, 1, 3, 8, 0)], dtype=object)

In [69]: idx.to_pydatetime()
Out[69]:
array([datetime.datetime(2012, 1, 1, 9, 0, tzinfo=<DstTzInfo 'Europe/Brussels' CET+1:00:00 STD>),
       datetime.datetime(2012, 1, 2, 9, 0, tzinfo=<DstTzInfo 'Europe/Brussels' CET+1:00:00 STD>),
       datetime.datetime(2012, 1, 3, 9, 0, tzinfo=<DstTzInfo 'Europe/Brussels' CET+1:00:00 STD>)], dtype=object)

But I am not sure whether that changes how the values are then stored in the MySQL database (since the UTC / naive time is still the same), so this might be worth testing.

That's a good point. In earlier tests, databases that don't support timestamp with timezone read the data back as naive (I don't know whether this is local naive, or converted to UTC first and then returned naive, though). Will add some tests.
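The difference between the two conversions in the IPython session above can be reproduced with the standard library alone. This is a minimal sketch, not the code in pandas/io/sql.py: a fixed +01:00 offset stands in for Europe/Brussels in January, and the names `brussels_winter`, `aware`, and `naive_utc` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +01:00 offset stands in for Europe/Brussels (CET) in January.
brussels_winter = timezone(timedelta(hours=1))

# A tz-aware value sent as-is (cf. idx.to_pydatetime()).
aware = datetime(2012, 1, 1, 9, 0, tzinfo=brussels_winter)

# The same instant reduced to naive UTC before sending
# (cf. idx.values.astype('M8[us]').astype(object)).
naive_utc = aware.astimezone(timezone.utc).replace(tzinfo=None)

print(aware.isoformat())      # wall-clock 09:00 with a +01:00 offset
print(naive_utc.isoformat())  # 08:00 with no offset attached
```

Both values describe the same instant, but only the first still carries the information a database driver would need to store an offset.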

jorisvandenbossche · 2018-09-13T09:21:33Z

pandas/tests/io/test_sql.py

+
    def test_date_parsing(self):
        # No Parsing
-        df = sql.read_sql_table("types_test_data", self.conn)


Shouldn't we rather test that it is not parsed instead of removing it? (but I agree that it currently does not look like it is doing much)

Oh I see. At a cursory glance I thought it was a duplicate of the line below. You're right, I will try to add an assert on this result as well.

jorisvandenbossche · 2018-09-13T09:25:23Z

pandas/tests/io/test_sql.py

+        df.to_sql('test_datetime_tz', self.conn)
+
+        expected = df.copy()
+        expected['A'] = expected['A'].dt.tz_convert('UTC')


Just to check that I remember correctly: when reading, if we get datetime objects with an offset (i.e. the column is a timestamp with time zone), we always convert to UTC because we cannot really know what the timezone is? (we only get a simple 'fixed offset'?)

That's correct, so we should expect this round trip test to load US/Pacific and return UTC
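The normalisation described in this exchange can be illustrated without a database. A rough sketch, assuming the driver hands back a datetime that carries only a fixed UTC offset (the sample string and the variable names are made up for illustration):

```python
from datetime import datetime, timedelta, timezone

# Assumption: this is the kind of value a driver returns for a
# TIMESTAMP WITH TIME ZONE column: an instant plus a fixed offset.
from_db = datetime.fromisoformat("2013-01-01 00:00:00-08:00")

# The offset alone does not identify a zone: several named zones
# share -08:00 in January, so the original name cannot be recovered.
assert from_db.utcoffset() == timedelta(hours=-8)

# Converting to UTC keeps the instant and yields one unambiguous zone.
as_utc = from_db.astimezone(timezone.utc)
print(as_utc.isoformat())
```

This is why a round trip that starts with US/Pacific data is expected to come back as UTC rather than as US/Pacific.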

mroeschke · 2018-09-18T04:04:50Z

@jorisvandenbossche so it looks like, based on the passing tests, databases without timezone support store the data as naive local time.

codecov · 2018-09-19T03:38:20Z

Codecov Report

Merging #22654 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #22654   +/-   ##
=======================================
  Coverage   92.24%   92.24%           
=======================================
  Files         161      161           
  Lines       51224    51224           
=======================================
  Hits        47254    47254           
  Misses       3970     3970

Flag	Coverage Δ
#multiple	`90.63% <ø> (ø)`	⬆️
#single	`42.3% <ø> (+0.02%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/generic.py	`96.81% <ø> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 28a42da...ef3b20f.

TomAugspurger · 2018-09-19T15:17:56Z

doc/source/whatsnew/v0.24.0.txt

 - :class:`IntervalIndex` has gained the :meth:`~IntervalIndex.set_closed` method to change the existing ``closed`` value (:issue:`21670`)
 - :func:`~DataFrame.to_csv`, :func:`~Series.to_csv`, :func:`~DataFrame.to_json`, and :func:`~Series.to_json` now support ``compression='infer'`` to infer compression based on filename extension (:issue:`15008`).
  The default compression for ``to_csv``, ``to_json``, and ``to_pickle`` methods has been updated to ``'infer'`` (:issue:`22004`).
+- :func:`to_sql` now supports writing ``TIMESTAMP WITH TIME ZONE`` columns (:issue:`9086`)


Should be

:meth:`DataFrame.to_sql`

jorisvandenbossche · 2018-10-24T18:13:13Z

@mroeschke unless I am mistaken about the consequence of course ..
Because I was now trying to write up an example myself, and I am not so sure any more of what I thought would change.

So above I said the following:

So if one had now a tz-aware column, it was converted to naive UTC, and you got back naive UTC. So the difference is now that you will get back tz-aware UTC ? (and of course that the schema of the sql table now includes 'with time zone')

But so for databases that don't support timezones. Previously we converted to UTC so users got also back naive UTC, while now they will get back naive local?

Can you answer this one explicitly? Is that last paragraph correct? If so, I think we should mention this in the whatsnew (the behavioural change; the fact itself that you get back naive local is already mentioned in the new io.rst section you added)

on which you answered that this was correct.

So therefore I assumed there is a significant change in behaviour, which I don't see reflected in the whatsnew note.

But:

If one has a tz-aware column now, and you try to write it, it actually errors instead of converting to naive UTC as I said above?
And if one previously read from a database that did support timezones, you already got utc data back, so nothing changed here as well?

mroeschke · 2018-10-24T21:09:27Z

So the only significant consequence is described here in the Datetimelike API Changes:

pandas/doc/source/whatsnew/v0.24.0.txt

Line 617 in de62788

    
           - :meth:`DataFrame.to_sql` now writes timezone aware datetime data (``datetime64[ns, tz]`` dtype) as timezone unaware local timestamps instead of timezone unaware UTC timestamps for database dialects that don't support the ``TIMESTAMP WITH TIME ZONE`` type. See the :ref:`io.sql_datetime_data` for implications (:issue:`9086`).

Roundtrips Before:

tz-aware DataFrame --> database that doesn't support timezones --> tz-naive UTC DataFrame
tz-aware DataFrame --> database that does support timezones --> tz-naive UTC DataFrame

Roundtrips After:

tz-aware DataFrame --> database that doesn't support timezones --> tz-naive local DataFrame
tz-aware DataFrame --> database that does support timezones --> tz-aware UTC DataFrame

So addressing your last comment's bullet points:

No error is raised. The data will now be written as naive local instead of naive UTC
Correct. Before, pandas read tz aware data from databases as UTC and that behavior has not changed.
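To make the "naive local" versus "naive UTC" distinction in the roundtrip lists concrete, here is a small standard-library sketch. It does not call to_sql; it stores both representations as text in an in-memory SQLite table and reads them back. The table name, column names, and the fixed +01:00 offset are assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumption: fixed +01:00 offset standing in for Europe/Brussels in January.
brussels_winter = timezone(timedelta(hours=1))
aware = datetime(2016, 1, 1, 0, 0, tzinfo=brussels_winter)

# "Naive local": keep the wall-clock time, drop the offset.
naive_local = aware.replace(tzinfo=None)
# "Naive UTC": shift to UTC first, then drop the offset.
naive_utc = aware.astimezone(timezone.utc).replace(tzinfo=None)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (label TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [
        ("naive_local", naive_local.isoformat(sep=" ")),
        ("naive_utc", naive_utc.isoformat(sep=" ")),
    ],
)
stored = dict(conn.execute("SELECT label, ts FROM demo"))
conn.close()

# Whatever was written is what comes back: no offset, no way to tell
# from the stored text which of the two conventions was used.
read_back = {label: datetime.fromisoformat(ts) for label, ts in stored.items()}
print(read_back)
```

The two stored values differ by exactly the UTC offset, which is why the choice of convention matters for anyone reading the table later.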

mroeschke · 2018-10-24T21:11:49Z

Yeah, given this discussion, it would be wise to highlight this change as its own section in the whatsnew.

mroeschke · 2018-10-25T15:51:12Z

@jorisvandenbossche added a new whatsnew section in API detailing the before and after implications of this change.

jorisvandenbossche · 2018-10-25T17:20:53Z

So yesterday I tried to make a small illustrative example showing the changed round-trip, but did it actually work before?
Because I get:

In [1]: import sqlalchemy

In [3]: engine = sqlalchemy.create_engine('sqlite:///:memory:')

In [4]: df = pd.DataFrame({'a':[1,2,3], 'b': pd.date_range('2016-01-01', periods=3, tz='Europe/Brussels')})

In [5]: df.to_sql('table', engine)
...
TypeError: Cannot cast DatetimeIndex to dtype datetime64[us]

In [6]: pd.__version__
Out[6]: '0.23.4'

So that is my source of confusion. I thought it worked before (and you seemed to confirm it), so therefore I assumed that there is a change in behaviour. But is that actually the case?
Do you have a running example of the cases that you list in the table above?

mroeschke · 2018-10-25T22:44:54Z

Hmm interesting. I made a (bad) assumption that the tests covered this case, but I actually don't see a test case where to_sql is tested with timezones. So the behavior you're seeing is probably correct.

I'll try to confirm tonight on other dbs as well, but I guess it looks like I fixed a bug too :)

mroeschke · 2018-10-26T02:01:57Z

I ran the same test on postgres @jorisvandenbossche and I got the same result.

Therefore you're correct that there really isn't a change in behavior, since previously calling to_sql with timezone-aware datetimes raised an error. I reflected this in a separate whatsnew entry in the IO bug section.

mroeschke · 2018-11-07T06:30:59Z

Added a fix for #23510 since it was a pretty easy addition to this PR

TomAugspurger · 2018-11-07T11:57:08Z

Small issue with the new test I think https://travis-ci.org/pandas-dev/pandas/jobs/451739169#L1986

_____ TestMySQLAlchemy.test_naive_datetimeindex_roundtrip[load_iris_data0] _____
self = <pandas.tests.io.test_sql.TestMySQLAlchemy object at 0x7febe71173d0>
    def test_naive_datetimeindex_roundtrip(self):
        # GH 23510
        # Ensure that a naive DatetimeIndex isn't converted to UTC
        dates = date_range('2018-01-01', periods=5, freq='6H')
        expected = DataFrame({'nums': range(5)}, index=dates)
        expected.to_sql('foo_table', self.conn, index_label='info_date')
        result = sql.read_sql_table('foo_table', self.conn,
                                    index_col='info_date')
>       tm.assert_frame_equal(result, expected)
E       AssertionError: DataFrame.index are different
E       
E       Attribute "names" are different
E       [left]:  [u'info_date']
E       [right]: [None]

looks good otherwise.

mroeschke · 2018-11-07T19:30:50Z

Thanks @TomAugspurger. Fixed that test (I don't think index name is expected to roundtrip in that test)
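For context, the failure reported above comes from the index name alone: the frame read back carries the name 'info_date' on its index while the expected frame has none. Below is a hedged sketch of one way to compare such frames while ignoring names, using the public `pandas.testing.assert_frame_equal` with `check_names=False`. The read-back frame is simulated with `rename_axis` rather than a database, and the thread does not show the exact change made in the PR (the later commit title "don't check name" only suggests something along these lines).

```python
import pandas as pd
from pandas.testing import assert_frame_equal

dates = pd.date_range("2018-01-01", periods=5, freq=pd.Timedelta(hours=6))
expected = pd.DataFrame({"nums": range(5)}, index=dates)

# Simulated read-back: same data, but the index has picked up a name.
result = expected.rename_axis("info_date")

# A strict comparison fails on the index name only...
try:
    assert_frame_equal(result, expected)
    names_differ = False
except AssertionError:
    names_differ = True

# ...while check_names=False still compares values, dtypes and index labels.
assert_frame_equal(result, expected, check_names=False)
print(names_differ)
```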

jorisvandenbossche · 2018-11-08T14:45:52Z

doc/source/whatsnew/v0.24.0.txt


 .. ipython:: python

   pser = pd.Series(pd.date_range("2000", freq="D", periods=5))


I would move this to the "enhancements" though, I would say that timezones were simply never supported, so it is a nice enhancement that we will now actually support it.

Ah, now that I am looking at the full diff (and not only at what changed recently), I see you actually already have that. It's a bit duplicated now, but I am fine with keeping it in both places.

jorisvandenbossche · 2018-11-08T14:48:22Z

pandas/io/sql.py

 from __future__ import division, print_function

 from contextlib import contextmanager
 from datetime import date, datetime, time


Just for my understanding: where was this attribute error caught before?

Whoop, sorry, again misunderstanding from not looking at the full diff :-)

jorisvandenbossche · 2018-11-08T14:55:42Z

Let's finally merge this! @mroeschke thanks for the endurance :-)

Matt Roeschke added 2 commits September 9, 2018 13:42

ENH: Write timezone columns to SQL

776240b

add tests and change type to Timestamp

befd200

Lint error and comment our skipif

e9f122f

Matt Roeschke added 2 commits September 10, 2018 13:38

Handle DatetimeTZ block

969d2da

Ensure the datetimetz data is 2D first

cc79b90

mroeschke added the Enhancement, IO SQL (to_sql, read_sql, read_sql_query), and Timezones (Timezone data dtype) labels Sep 11, 2018

Matt Roeschke added 3 commits September 11, 2018 13:49

Merge remote-tracking branch 'upstream/master' into writing_timezone_sql

24dbaa5

Reading timezones returns timezones in UTC

6e86d58

Add whatsnew and some touchups

c7c4a7a

mroeschke changed the title ~~WIP: Support writing timestamps with timezones with to_sql~~ ENH: Support writing timestamps with timezones with to_sql Sep 12, 2018

mroeschke added this to the 0.24.0 milestone Sep 12, 2018

mroeschke requested a review from jorisvandenbossche September 13, 2018 05:34

jorisvandenbossche reviewed Sep 13, 2018

Matt Roeschke added 9 commits September 13, 2018 20:03

Merge remote-tracking branch 'upstream/master' into writing_timezone_sql

6aa4878

Test other dbs

513bbc8

timestamps are actually returned as naive local for myself, sqlite

58772e1

localize -> tz_localize

1a29148

sqlite doesnt support date types

96e9188

type

ded5584

Merge remote-tracking branch 'upstream/master' into writing_timezone_sql

d575089

retest

a7d1b3e

read_table vs read_query sqlite difference

305759c

Add note in the to_sql docs

7a79531

TomAugspurger reviewed Sep 19, 2018

Matt Roeschke added 2 commits October 24, 2018 15:22

Merge remote-tracking branch 'upstream/master' into writing_timezone_sql

e940279

Add new section in whatsnew

8c754b5

Matt Roeschke added 2 commits October 25, 2018 18:13

Merge remote-tracking branch 'upstream/master' into writing_timezone_sql

e85842f

Fix whatsnew to reflect prior bug

5af83f7

Matt Roeschke added 4 commits November 5, 2018 22:40

Merge remote-tracking branch 'upstream/master' into writing_timezone_sql

6b3a3f1

Merge remote-tracking branch 'upstream/master' into writing_timezone_sql

c4304ec

handle case when column is datetimeindex

1054fdb

Add new whatsnew entry

f21c755

mroeschke mentioned this pull request Nov 7, 2018

to_sql method turns datetime64 index to time zone aware in postgres #23510

Closed

Matt Roeschke added 2 commits November 7, 2018 09:51

Merge remote-tracking branch 'upstream/master' into writing_timezone_sql

f872ff7

don't check name

ef3b20f

jorisvandenbossche reviewed Nov 8, 2018

jorisvandenbossche approved these changes Nov 8, 2018

jorisvandenbossche merged commit d0ec813 into pandas-dev:master Nov 8, 2018

mroeschke deleted the writing_timezone_sql branch November 8, 2018 18:02

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

ENH: Support writing timestamps with timezones with to_sql (pandas-de…

7c0b5d0

…v#22654)

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

ENH: Support writing timestamps with timezones with to_sql (pandas-de…

5fa29a3

…v#22654)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

ENH: Support writing timestamps with timezones with to_sql (pandas-de…

8986e3c

…v#22654)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

ENH: Support writing timestamps with timezones with to_sql (pandas-de…

2fdc5d2

…v#22654)


5 participants

mroeschke commented Sep 10, 2018 •

edited

Loading

codecov bot commented Sep 10, 2018 •

edited

Loading

codecov bot commented Sep 19, 2018 •

edited

Loading